approximate computing
Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability
Panteleaki, Aikaterini Maria, Balaskas, Konstantinos, Zervakis, Georgios, Amrouch, Hussam, Anagnostopoulos, Iraklis
As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over embodied carbon footprint due to complex fabrication processes. In this work, we propose a carbon-efficient design methodology for 3D DNN accelerators, leveraging approximate computing and genetic algorithm-based design space exploration to optimize Carbon Delay Product (CDP). By integrating area-efficient approximate multipliers into Multiply-Accumulate (MAC) units, our approach effectively reduces silicon area and fabrication overhead while maintaining high computational accuracy. Experimental evaluations across three technology nodes (45nm, 14nm, and 7nm) show that our method reduces embodied carbon by up to 30% with negligible accuracy drop. The rapid growth of Artificial Intelligence (AI) has resulted in the wide adoption of Deep Neural Networks (DNNs) as a fundamental component of modern computing systems. To efficiently support the computational demands of DNNs, specialized hardware accelerators have been developed, offering significant improvements in throughput and energy efficiency. These accelerators have enabled AI deployment across a wide range of environments, from large-scale data centers to resource-constrained edge devices.
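To make the idea of an approximate multiplier inside a MAC unit concrete, here is a minimal Python sketch. It assumes a simple operand-truncation scheme (zeroing the low bits of each operand before multiplying), which is only one common family of approximate multipliers and is not claimed to be the design used in the paper. The names `approx_mul`, `approx_mac`, and `mean_relative_error` are hypothetical and chosen for illustration.

```python
def approx_mul(a: int, b: int, drop_bits: int = 2) -> int:
    """Approximate product of two unsigned ints.

    Assumption: the low `drop_bits` bits of each operand are zeroed before
    multiplying. In hardware, such truncation typically removes
    partial-product logic, which is where area savings would come from.
    """
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


def approx_mac(acc: int, a: int, b: int, drop_bits: int = 2) -> int:
    """One multiply-accumulate step that uses the approximate multiplier."""
    return acc + approx_mul(a, b, drop_bits)


def mean_relative_error(drop_bits: int, width: int = 8) -> float:
    """Average relative error over all nonzero unsigned `width`-bit operand pairs."""
    total, count = 0.0, 0
    for a in range(1, 1 << width):
        for b in range(1, 1 << width):
            exact = a * b
            total += (exact - approx_mul(a, b, drop_bits)) / exact
            count += 1
    return total / count


if __name__ == "__main__":
    # Print how the error grows as more low-order bits are dropped.
    for k in (0, 1, 2, 3):
        print(f"drop_bits={k}: mean relative error = {mean_relative_error(k):.4f}")
```

Running the script prints the mean relative error for each truncation level, which is the kind of accuracy figure a design space exploration would weigh against area savings.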
LibAMM: Empirical Insights into Approximate Computing for Accelerating Matrix Multiplication
Matrix multiplication (MM) is pivotal in fields from deep learning to scientific computing, driving the quest for improved computational efficiency. Accelerating MM encompasses strategies like complexity reduction, parallel and distributed computing, hardware acceleration, and approximate computing techniques, namely AMM algorithms. Amidst growing concerns over the resource demands of large language models (LLMs), AMM has garnered renewed focus. However, understanding the nuances that govern AMM's effectiveness remains incomplete. This study delves into AMM by examining algorithmic strategies, operational specifics, dataset characteristics, and their application in real-world tasks.
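As an illustration of what an AMM algorithm can look like, the following Python sketch implements randomized column-row sampling, a classic sampling-based estimator of a matrix product. It is offered as a generic example under stated assumptions and is not claimed to match any specific algorithm evaluated in the study. The function names are hypothetical, and matrices are plain lists of lists to keep the example dependency-free.

```python
import math
import random


def matmul(A, B):
    """Exact product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sampled_matmul(A, B, s, rng):
    """Estimate A @ B from s sampled column-row outer products.

    Index k is drawn with probability proportional to
    ||A[:, k]|| * ||B[k, :]||, and each sampled outer product is rescaled
    by 1 / (s * p_k) so that the estimate is unbiased.
    """
    m, n, p = len(A), len(B), len(B[0])
    col_norm = [math.sqrt(sum(A[i][k] ** 2 for i in range(m))) for k in range(n)]
    row_norm = [math.sqrt(sum(x * x for x in B[k])) for k in range(n)]
    weights = [c * r for c, r in zip(col_norm, row_norm)]
    total = sum(weights)
    C = [[0.0] * p for _ in range(m)]
    if total == 0.0:
        return C  # the exact product is the zero matrix
    probs = [w / total for w in weights]
    for k in rng.choices(range(n), weights=probs, k=s):
        scale = 1.0 / (s * probs[k])
        for i in range(m):
            aik = A[i][k] * scale
            for j in range(p):
                C[i][j] += aik * B[k][j]
    return C


def frobenius_error(C, D):
    """Frobenius norm of the difference between two same-shaped matrices."""
    return math.sqrt(sum((c - d) ** 2 for rc, rd in zip(C, D) for c, d in zip(rc, rd)))


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(8)]
    B = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(64)]
    exact = matmul(A, B)
    for s in (8, 32, 128):
        print(s, frobenius_error(sampled_matmul(A, B, s, rng), exact))
```

The sample count `s` is the accuracy/cost knob: fewer samples mean less work but a noisier estimate, which is the kind of trade-off the abstract describes.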
Mixed-Precision Over-The-Air Federated Learning via Approximated Computing
Yuan, Jinsheng, Wei, Zhuangkun, Guo, Weisi
Over-the-Air Federated Learning (OTA-FL) has been extensively investigated as a privacy-preserving distributed learning mechanism. Realistic systems will see FL clients with diverse size, weight, and power configurations. A critical research gap in existing OTA-FL research is the assumption of homogeneous client computational bit precision. Indeed, many clients may exploit approximate computing (AxC) where bit precisions are adjusted for energy and computational efficiency. The dynamic distribution of bit precision updates amongst FL clients poses an open challenge for OTA-FL, as it is incompatible in the wireless modulation superposition space. Here, we propose an AxC-based OTA-FL framework of clients with multiple precisions, demonstrating the following innovations: (i) optimize the quantization-performance trade-off for both server and clients within the constraints of varying edge computing capabilities and learning accuracy requirements, and (ii) develop heterogeneous gradient resolution OTA-FL modulation schemes to ensure compatibility with physical layer OTA aggregation. Our findings indicate that we can design modulation schemes that enable AxC-based OTA-FL, which can achieve 50% faster and smoother server convergence and a performance enhancement for the lowest precision clients compared to a homogeneous precision approach. This demonstrates the great potential of our AxC-based OTA-FL approach in heterogeneous edge computing environments.
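The notion of clients that report gradients at different bit precisions can be sketched as follows. This is a simplified illustration that assumes uniform symmetric quantization and plain server-side averaging. It does not model the wireless superposition or the modulation schemes that the paper proposes, and the names `quantize` and `aggregate` are hypothetical.

```python
def quantize(values, bits, max_abs=1.0):
    """Uniformly quantize values in [-max_abs, max_abs] to a signed `bits`-bit grid.

    Assumption: a symmetric grid with 2**(bits - 1) - 1 positive levels.
    Values outside the range are clipped first.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    levels = 2 ** (bits - 1) - 1
    step = max_abs / levels
    out = []
    for v in values:
        clipped = max(-max_abs, min(max_abs, v))
        out.append(round(clipped / step) * step)
    return out


def aggregate(client_grads, client_bits):
    """Average gradients after each client quantizes at its own precision."""
    quantized = [quantize(g, b) for g, b in zip(client_grads, client_bits)]
    n = len(quantized)
    return [sum(col) / n for col in zip(*quantized)]


if __name__ == "__main__":
    grads = [[0.31, -0.72, 0.05], [0.29, -0.70, 0.07], [0.33, -0.69, 0.04]]
    # Hypothetical heterogeneous clients: 4-bit, 8-bit and 16-bit precision.
    print(aggregate(grads, [4, 8, 16]))
```

Lower-precision clients contribute coarser values to the average, which gives a rough picture of the quantization-performance trade-off mentioned in the abstract.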
Reservoir Computing Using Measurement-Controlled Quantum Dynamics
Abbas, A. H., Maksymov, Ivan S.
Modern digital computers can solve virtually any computational problem. However, to accomplish a computational task of arbitrary complexity, they may require impracticably large resources such as time and memory. To resolve this challenge, unconventional [1,2] and neuromorphic [3-10] computing were proposed as the new methods of computer engineering, where elements of a computer mimic the operation of a biological brain relying on physical and chemical processes [11,12]. While neuromorphic computers may not be as universal as the traditional digital ones, they can solve certain practically important problems with feasible accuracy using just a small fraction of the computational resources and energy needed by a high-performance computer tasked with the same problem. Neuromorphic computers are also inherently scalable, parallel and allow for collocation of data processing and memory [9]. Similarly to a biological brain, they also operate only when input data are available and mimic the randomness of the firing of biological neurons, thus helping save energy and decrease the overall cost of computations [13,14].
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
Leon, Vasileios, Hanif, Muhammad Abdullah, Armeniakos, Giorgos, Jiao, Xun, Shafique, Muhammad, Pekmestzi, Kiamal, Soudris, Dimitrios
The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing the quality of results to be tuned in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.
Training Neural Networks for Execution on Approximate Hardware
Li, Tianmu, Li, Shurui, Gupta, Puneet
Approximate computing methods have shown great potential for deep learning. Due to the reduced hardware costs, these methods are especially suitable for inference tasks on battery-operated devices that are constrained by their power budget. However, approximate computing hasn't reached its full potential due to the lack of work on training methods. In this work, we discuss training methods for approximate hardware. We demonstrate how training needs to be specialized for approximate hardware, and propose methods to speed up the training process by up to 18X.
Approximate Computing and the Efficient Machine Learning Expedition
Henkel, Jörg, Li, Hai, Raghunathan, Anand, Tahoori, Mehdi B., Venkataramani, Swagath, Yang, Xiaoxuan, Zervakis, Georgios
Approximate computing (AxC) has been long accepted as a design alternative for efficient system implementation at the cost of relaxed accuracy requirements. Despite the AxC research activities in various application domains, AxC thrived the past decade when it was applied in Machine Learning (ML). The by definition approximate notion of ML models but also the increased computational overheads associated with ML applications (that were effectively mitigated by corresponding approximations) led to a perfect matching and a fruitful synergy. AxC for AI/ML has transcended beyond academic prototypes. In this work, we enlighten the synergistic nature of AxC and ML and elucidate the impact of AxC in designing ...
Approximate computing refers to techniques that exploit the inherent error resilience of several applications to achieve improvements in efficiency (e.g., energy and performance) at all layers of the computing stack [60]. For example, prior analysis on a benchmark suite of 12 recognition, mining and search applications showed that 83% of the runtime is spent in tasks that are amenable to approximation [15, 60]. The origins of approximate computing (AxC) can be traced back to various fields including computer arithmetic (floating point representation) [63], arithmetic units (adders [54] and multipliers [80]), digital signal processing (filter design) [27], algorithms (approximation algorithms) [62], and networking (best-effort packet delivery) [9].
Cross-Layer Approximation For Printed Machine Learning Circuits
Armeniakos, Giorgos, Zervakis, Georgios, Soudris, Dimitrios, Tahoori, Mehdi B., Henkel, Jörg
Printed electronics (PE) feature low non-recurring engineering costs and low per unit-area fabrication costs, thus enabling extremely low-cost and on-demand hardware. Such low-cost fabrication allows for high customization that would be infeasible in silicon, and bespoke architectures prevail to improve the efficiency of emerging PE machine learning (ML) applications. However, even with bespoke architectures, the large feature sizes in PE constrain the complexity of the ML models that can be implemented. In this work, we bring together, for the first time, approximate computing and PE design, aiming to enable complex ML models, such as Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs), in PE. To this end, we propose and implement a cross-layer approximation, tailored for bespoke ML architectures. At the algorithmic level we apply a hardware-driven coefficient approximation of the ML model and at the circuit level we apply a netlist pruning through a full search exploration. In our extensive experimental evaluation we consider 14 MLPs and SVMs and evaluate more than 4300 approximate and exact designs. Our results demonstrate that our cross-layer approximation delivers Pareto optimal designs that, compared to the state-of-the-art exact designs, feature 47% and 44% average area and power reduction, respectively, and less than 1% accuracy loss.
Towards a Next Generation Computing Paradigm: Approximate Computing in Robotics Systems and Environment-Experimentation, Case Study and Practical Implications
Approximate computing is a computation domain which can be used to trade off time and energy against quality and is therefore useful in embedded systems. Energy is the prime resource in battery-driven embedded systems, like robots. Approximate computing can be used as a technique to generate approximate versions of the control functionalities of a robot, enabling it to ration energy for computation at the cost of degraded quality. Usually, the programmer of the function specifies the extent of degradation that is safe for the overall safety of the system. However, in a collaborative environment, where several sub-systems co-exist and some of the functionality of each of them has been approximated, the safety of the overall system may be compromised. In this paper, we consider multiple identical robots operating in a warehouse, where the path planning function of each robot is approximated. Although the planned paths are safe for individual robots (i.e., they do not collide with the racks), we show that this leads to a collision among the robots. So, a controlled approximation needs to be carried out in such situations to harness the full power of this new paradigm if it is to become a mainstream paradigm in the future.
IBM Tech Trends To Watch In 2020 … And Beyond
In the decade now drawing to a close, much of the world's business and everyday life became fully digital. But innovation is far from over. The 2020s will see further refinement of established technologies and the practical rollout of new modes, like quantum computing, that are still at the experimental stage. In December 2029, we'll no doubt be remarking on inventions that today we still can't imagine. But for now, here is IBM's preview of the year and decade ahead.